Self-Evolving Evaluation Benchmarks Research Internship
Cambridge
- Developing a self-evolving benchmarking framework, incorporating dynamic rubric criteria.
- Designing and implementing evidence-grounded scoring mechanisms, ensuring that model claims and reasoning steps are supported by verifiable traces, tool outputs, or retrieved evidence.
- Investigating robustness and anti-gaming strategies, including adversarial testing to detect behaviours where models optimize the score without improving real-world quality.
- Building lightweight benchmarking tools, following solid software engineering practices to ensure reproducibility, traceability, and modularity.
- Analyzing model behaviour across multiple scientific task families, such as protocol drafting, reasoning chains, and multi-agent planning, to assess the generality of the evolving benchmark.
- Collaborating with scientists to identify key failure modes, high-value assessment signals, and opportunities to integrate the benchmarking framework into scientific workflows.
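The evidence-grounded rubric scoring described in the responsibilities above could, in its simplest form, look something like the following sketch. All names and criteria here are illustrative assumptions for a toy example, not AstraZeneca's framework: keyword matching stands in for whatever real evidence-verification mechanism the project develops.

```python
from dataclasses import dataclass, field


@dataclass
class Criterion:
    """One rubric criterion, credited only when evidence supports it."""
    name: str
    weight: float
    # Terms whose presence in cited evidence counts as support (toy proxy
    # for a real verification step against tool outputs or retrieved sources).
    evidence_terms: list = field(default_factory=list)


@dataclass
class Claim:
    """A model claim together with the verbatim traces it cites."""
    text: str
    evidence: list  # snippets from tool outputs or retrieved documents


def score_claim(claim: Claim, criteria: list) -> float:
    """Return a weighted score in [0, 1] over the rubric criteria.

    A criterion earns its weight only if some cited evidence snippet
    supports it, so a model cannot raise its score by asserting things
    without verifiable traces (one crude anti-gaming property).
    """
    total = sum(c.weight for c in criteria)
    earned = 0.0
    for c in criteria:
        supported = any(
            term.lower() in snippet.lower()
            for term in c.evidence_terms
            for snippet in claim.evidence
        )
        if supported:
            earned += c.weight
    return earned / total if total else 0.0
```

A dynamic, self-evolving benchmark would then mutate the `criteria` list over time (re-weighting, adding, or retiring criteria) rather than keeping it fixed.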
- Currently pursuing a PhD in computer science, machine learning, computational sciences, AI evaluation/robustness, or a related field.
- Strong experience with machine learning and deep learning methods, ideally including evaluation- or alignment-related work.
- Excellent Python programming skills; familiarity with frameworks such as PyTorch, JAX, or TensorFlow.
- Strong analytical mindset with enthusiasm for evaluation science, reliability, and AI governance.
- Ability to work collaboratively in a team environment and communicate scientific ideas effectively.
- Must be at least 18 years of age at time of application.
- Must have UK right-to-work status.
- Must return to schooling at programme close (candidates graduating before or during the programme are ineligible).
- Experience with benchmarking, evaluation rubrics, reinforcement learning from human/AI feedback, or model auditing.
- Familiarity with agentic AI systems, tool-using models, multi-agent workflows, or long-context reasoning analysis.
- Knowledge of rubric-based scoring, checklists, or structured evaluation frameworks.
- Experience with adversarial testing, generative model safety, or failure mode taxonomy development.
- Interest in applying evaluation science to scientific, biomedical, or protocol generation tasks.
Date Posted
30-Jan-2026

Closing Date
13-Feb-2026

Our mission is to build an inclusive and equitable environment. We want people to feel they belong at AstraZeneca and Alexion, starting with our recruitment process. We welcome and consider applications from all qualified candidates, regardless of characteristics. We offer reasonable adjustments/accommodations to help all candidates to perform at their best. If you have a need for any adjustments/accommodations, please complete the section in the application form.

AstraZeneca embraces diversity and equality of opportunity. We are committed to building an inclusive and diverse team representing all backgrounds, with as wide a range of perspectives as possible, and harnessing industry-leading skills. We believe that the more inclusive we are, the better our work will be. We welcome and consider applications to join our team from all qualified candidates, regardless of their characteristics. We comply with all applicable laws and regulations on non-discrimination in employment (and recruitment), as well as work authorisation and employment eligibility verification requirements.